Abstract

We take up optimality results for robust Kalman filtering from Ruckdeschel (2001, 2010),
where robustness is understood in a distributional sense, i.e., we enlarge the distribution
assumptions made in the ideal model by suitable neighborhoods, allowing for outliers
which in our context may be system-endogenous/propagating or system-exogenous/non-propagating,
inducing the somewhat conflicting goals of tracking and attenuation. Correspondingly,
the cited references provide optimally-robust procedures to deal with each
type of outlier separately, but in the case of IO-robustness they do not say much about the
implementation. We discuss this in more detail in this paper. Most importantly, we
define a hybrid filter combining AO- and IO-optimal ones, which is able to treat both
types of outliers simultaneously, albeit with a certain delay. We check our filters at a
reference state space model and compare the results with those obtained by the ACM
filter (Masreliez and Martin, 1977; Martin, 1979) and by non-parametric, repeated-median
based filters (Fried et al., 2006, 2007).

Keywords: robustness, Kalman filter, innovation outlier, additive outlier, minimax robustness

1. Introduction

Robustness is an "old" problem in Kalman filtering, with first (non-verified) hits on
a quick search for "robust Kalman filter" on scholar.google.com as early as 1962, i.e.,
even before the seminal Huber (1964) paper, often referred to as the birthday of Robust
Statistics.

The amount of literature on this topic is huge, and we do not attempt to give a comprehensive
account here. Instead, we refer the reader to the excellent surveys given in Ershov
and Liptser (1978), Kassam and Poor (1985), Stockinger and Dutter (1987), Schick
and Mitter (1994), Künsch (2001), and to some extent Ruckdeschel (2001, Sect. 1.5).
The mere notion of robustness in the context of filtering is non-standard; most frequently
it is used to describe stability of the procedure w.r.t. certain variations of the
"input parameters". The choice of the "input parameters" to look at varies from notion
to notion. In this paper we are concerned with (distributional) minimax robustness,
i.e., we allow for deviations from the ideal distributional model assumptions, defining
suitable neighborhoods about this ideal model. For these neighborhoods, we consider
procedures minimizing the maximal predictive inaccuracy on these neighborhoods, measured
in terms of mean squared error (MSE).

These minimax filters derived in Ruckdeschel (2010) come as closed-form saddle-points
consisting of an optimally-robust procedure and a corresponding least favorable
outlier situation. Their optimality holds in a surprisingly general setup of state space
models, which is not limited to a Euclidean or time-discrete framework.

The important condition is that MSE makes sense for the range of the states, so these
results cover general Hidden Markov Models for arbitrary observation space, dynamic
(generalized) linear models as discussed in West et al. (1985) and West and Harrison
(1989), as well as continuous-time settings such as those used in applications of Mathematical
Finance, compare, e.g., Nielsen et al. (2000) and Singer (2002) for respective model
formulations. For the application, we need to linearize the corresponding transition
and observation functions suitably, e.g., to give the (continuous-discrete) Extended
Kalman Filter (EKF). We even cover applications such as optimal portfolio selection,
with corresponding controls $U_t$ (i.e., buying and selling operations) and, in principle,
indirectly observed random fields.

In this paper though, to clarify ideas, we limit ourselves to the linear, Euclidean, time-discrete
state space model (SSM), and in the ideal model assume normality. More details
on these models can be found in many textbooks, cf., e.g., Anderson and Moore (1979),
Harvey (1991), Hamilton (1993), and Durbin and Koopman (2001).

2. Setup 

2.1. Ideal model 

Let us start with some definitions and assumptions. Our SSM consists in an unobservable
$p$-dimensional state $X_t$ evolving according to a possibly time-inhomogeneous vector
autoregressive model of order 1 (VAR(1)) with innovations $v_t$ and transition matrices $F_t$,
i.e.,

$$X_t = F_t X_{t-1} + v_t \qquad (1)$$

We only observe a $q$-dimensional linear transformation $Y_t$ of $X_t$ involving an additional
observation error $\varepsilon_t$,

$$Y_t = Z_t X_t + \varepsilon_t \qquad (2)$$

In the ideal model we work in a Gaussian context, that is we assume

$$v_t \stackrel{\rm ind}{\sim} \mathcal{N}_p(0, Q_t), \quad \varepsilon_t \stackrel{\rm ind}{\sim} \mathcal{N}_q(0, V_t), \quad X_0 \sim \mathcal{N}_p(a_0, Q_0), \qquad (3)$$
$$\{X_0, v_s, \varepsilon_t,\ s, t \in \mathbb{N}\} \ \text{stochastically independent} \qquad (4)$$

For this paper, we assume the hyper-parameters $F_t$, $Z_t$, $Q_t$, $V_t$, $a_0$ to be known.
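
To make the ideal model concrete, the following minimal Python sketch simulates a scalar ($p = q = 1$), time-invariant instance of the state and observation equations. The function name `simulate_ssm`, its argument names, and the default values are assumptions chosen for this illustration only.

```python
import random


def simulate_ssm(n, F=1.0, Z=1.0, Q=1.0, V=1.0, a0=0.0, Q0=1.0, seed=0):
    """Simulate n steps of a scalar, time-invariant ideal (Gaussian) SSM.

    Returns the lists (states, observations).
    """
    rng = random.Random(seed)
    x = rng.gauss(a0, Q0 ** 0.5)               # initial state X_0 ~ N(a0, Q0)
    states, observations = [], []
    for _ in range(n):
        x = F * x + rng.gauss(0.0, Q ** 0.5)   # state equation
        y = Z * x + rng.gauss(0.0, V ** 0.5)   # observation equation
        states.append(x)
        observations.append(y)
    return states, observations


if __name__ == "__main__":
    xs, ys = simulate_ssm(5)
    for t, (x, y) in enumerate(zip(xs, ys), start=1):
        print(t, round(x, 3), round(y, 3))
```

All innovations and observation errors are drawn independently from one seeded generator, which mirrors the independence assumption of the ideal model and keeps runs reproducible.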
2.2. Deviations from the ideal model

As announced, these ideal model assumptions for robustness considerations are extended
by allowing (small) deviations, most prominently generated by outliers. In our
notation, suffix "id" indicates the ideal setting, "di" the distorting (contaminating) situation,
"re" the realistic, contaminated situation.

AO's and IO's. We follow the terminology of Fox (1972), who distinguishes innovation
outliers (or IO's) and additive outliers (or AO's). Historically, AO's and IO's denote gross
errors affecting the observation errors and the innovations, respectively. For consistency
with the literature, we stick to this distinction, but use these terms in a wider sense:
IO's stand for general endogenous outliers entering the state equation, hence with propagated
distortion, also covering level shifts or linear trends which would not be included
in the original definition. Similarly, wide-sense AO's denote general exogenous outliers
which do not propagate, like substitutive outliers or SO's as defined in equations (18)-(20).

Different and competing goals induced by AO's and IO's. Due to their different nature, 
as a rule, a different reaction in the presence of IO's and AO's is required. As AO's are 
exogenous, we would like to ignore them as far as possible, damping their effect, while 
when there are IO's, something has happened in the system, so the usual goal will be to 
detect these structural changes as fast as possible. 

A situation where both AO's and IO's may occur is more difficult: We are faced with
an identification problem, as we cannot distinguish IO from AO type immediately after
a suspicious observation; hence a simultaneous treatment is only possible with a certain
delay (see Section 4).

Another task, not pursued further in this paper, consists in recovering the situation 
without structural changes in an off-line situation. An example is spectral analysis of 
inter-individual heart frequency spectra, which requires a cleaning from both (wide-sense) 
IO's and AO's; after this cleaning the powerful instruments of spectral analysis will be 
available; cf. Spangl (2008). 

3. Kalman filter and robust alternatives 

3.1. Classical Method: Kalman-Filter 

Filter Problem. The most important problem in SSM formulation is the reconstruction
of the unobservable states $X_t$ by means of the observations $Y_t$. For abbreviation let us
denote

$$Y_{1:t} = (Y_1, \ldots, Y_t), \qquad Y_{1:0} := \emptyset \qquad (5)$$

Using MSE risk, the optimal reconstruction is the solution to

$$E\,|X_t - f_t|^2 = \min_{f_t}!, \qquad f_t \ \text{measurable w.r.t.}\ \sigma(Y_{1:s}) \qquad (6)$$

We focus on filtering ($s = t$) in this paper, while $s < t$ makes for a prediction, and $s > t$
for a smoothing problem.
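Kalman-Filter. The general solution to (6), the corresponding conditional expectation
$E[X_t | Y_{1:s}]$, in general is rather expensive to compute. Hence, as in the Gauss-Markov
setting, restriction to linear filters is a common way out. In this context, Kalman (1960)
introduced a recursive scheme to compute this optimal linear filter, reproduced here for
later reference:

Initialization:
$$X_{0|0} = a_0, \qquad \Sigma_{0|0} = Q_0 \qquad (7)$$
Prediction:
$$X_{t|t-1} = F_t X_{t-1|t-1}, \qquad \Sigma_{t|t-1} = F_t \Sigma_{t-1|t-1} F_t^\tau + Q_t \qquad (8)$$
Correction:
$$X_{t|t} = X_{t|t-1} + M_t^0 \Delta Y_t, \qquad \Delta Y_t = Y_t - Z_t X_{t|t-1},$$
$$M_t^0 = \Sigma_{t|t-1} Z_t^\tau \Delta_t^{-1}, \qquad \Delta_t = Z_t \Sigma_{t|t-1} Z_t^\tau + V_t, \qquad \Sigma_{t|t} = (I_p - M_t^0 Z_t) \Sigma_{t|t-1} \qquad (9)$$

where $\Sigma_{t|t} = \mathrm{Cov}(X_t - X_{t|t})$, $\Sigma_{t|t-1} = \mathrm{Cov}(X_t - X_{t|t-1})$, and $M_t^0$ is the so-called Kalman
gain. In the sequel, $\Delta X_t = X_t - X_{t|t-1}$ denotes the prediction error and $\Delta Y_t$ the corresponding innovation.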

The Kalman filter has a clear-cut structure with an initialization, a prediction, and
a correction step. Evaluation and interpretation are easy, as all steps are linear. The
strict recursivity / Markovian structure of the state equation allows one to concentrate
all information from the past useful for the future in $X_{t|t-1}$.

This linearity is also the reason for its non-robustness, as observations $y$ enter unbounded
into the correction step. A good robustification has to be bounded in the observations,
otherwise preserving the advantages of the Kalman filter as far as possible.
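
The initialization, prediction, and correction steps can be sketched in a few lines for the scalar, time-invariant case. This is only an illustration of the standard Kalman recursion; the function name `kalman_filter` and the returned tuple layout are assumptions made for this sketch.

```python
def kalman_filter(ys, F=1.0, Z=1.0, Q=1.0, V=1.0, a0=0.0, Q0=1.0):
    """Classical Kalman filter for a scalar, time-invariant SSM.

    Returns one tuple (x_pred, x_filt, S_pred, S_filt, M0) per observation.
    """
    x_filt, S_filt = a0, Q0                  # initialization
    out = []
    for y in ys:
        x_pred = F * x_filt                  # prediction of the state
        S_pred = F * S_filt * F + Q          # prediction error variance
        dY = y - Z * x_pred                  # innovation
        Delta = Z * S_pred * Z + V           # innovation variance
        M0 = S_pred * Z / Delta              # Kalman gain
        x_filt = x_pred + M0 * dY            # correction (linear in dY)
        S_filt = (1.0 - M0 * Z) * S_pred     # filter error variance
        out.append((x_pred, x_filt, S_pred, S_filt, M0))
    return out


if __name__ == "__main__":
    for row in kalman_filter([2.0, 1.5, 30.0, 1.0]):
        print([round(v, 3) for v in row])
```

The correction `M0 * dY` grows without bound in the observation, which is exactly the non-robustness noted above: the value 30.0 in the usage example pulls the filter strongly towards it. For $F = Z = Q = V = 1$ the gain converges to $(\sqrt{5} - 1)/2 \approx 0.618$ within a few steps.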



3.2. The rLS as optimally robust filter

The procedures we discuss in this paper are based on robustifying recursive
Least Squares: rLS. Let us begin with (wide-sense) AO's.

With only AO's, there is no need for robustification in the initialization and prediction
steps, as no (new) observations enter. For the correction step, we use the following
robustification, compare Ruckdeschel (2000): Instead of $M^0 \Delta Y$, we use a Huberization
of this correction,

$$H_b(M^0 \Delta Y) = M^0 \Delta Y \, \min\{1, b/|M^0 \Delta Y|\} \qquad (10)$$

for some suitably chosen clipping height $b$. While this is a bounded substitute for the
correction step in the classical Kalman filter, it still remains reasonably simple, is
non-iterative and hence especially useful for online purposes.
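
A minimal Python sketch of the Huberization in (10), using the Euclidean norm on a vector given as a list of floats. The function name `huberize` is an assumption for this illustration.

```python
import math


def huberize(v, b):
    """Return H_b(v) = v * min{1, b / |v|} for the Euclidean norm |.|."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm <= b:                       # small corrections pass unchanged
        return list(v)
    return [c * b / norm for c in v]    # large ones are shrunk to length b


if __name__ == "__main__":
    print(huberize([0.3, -0.4], 1.0))   # norm 0.5 <= 1: unchanged
    print(huberize([30.0, -40.0], 1.0)) # norm 50 > 1: direction kept, length 1
```

The direction of the correction is preserved and only its length is capped at $b$, so the clipped correction step stays bounded no matter how large the observation is.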

However it should be noted that, departing from the Kalman filter and at the same 
time insisting on strict recursivity, we possibly exclude "better" non-recursive procedures. 
These procedures on the other hand would be much more expensive to compute. 

Remark 3.1. $|\cdot|$ in the expression $|M^0 \Delta Y|$ denotes the Euclidean norm of $\mathbb{R}^p$; instead,
however, you could also use other norms like Mahalanobis-type ones. The choice of a
quadratic-form-type norm for $H_b$ does not affect the optimality statements of Theorem 3.2
below, provided that the same norm is used in the MSE.

Choice of the clipping height b. For the choice of $b$, we have two proposals. Both are
based on the simplifying assumption that $E_{id}[\Delta X | \Delta Y]$ is linear, which in fact turns out to
be only approximately correct. The first one chooses $b = b(\delta)$ according to an Anscombe
criterion,

$$E_{id} |\Delta X - H_b(M^0 \Delta Y)|^2 = (1 + \delta)\, E_{id} |\Delta X - M^0 \Delta Y|^2 \qquad (11)$$

where $\delta$ may be interpreted as an "insurance premium" to be paid in terms of efficiency.

The second criterion uses the radius $r \in [0, 1]$ of the neighborhood $\mathcal{U}^{SO}(r)$ (defined in
(20)) and determines $b = b(r)$ such that

$$(1 - r)\, E_{id} (|M^0 \Delta Y| - b)_+ = r b \qquad (12)$$

This will produce the minimax-MSE procedure for $\mathcal{U}^{SO}(r)$ according to Theorem 3.2
below.
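
As an illustration of criterion (12), the following sketch solves for $b(r)$ by bisection in the scalar case, assuming that in the ideal model $M^0 \Delta Y \sim \mathcal{N}(0, s^2)$. Under this assumption $E(|W| - b)_+ = 2\,[\,s\,\varphi(b/s) - b\,(1 - \Phi(b/s))\,]$ for $W \sim \mathcal{N}(0, s^2)$. The function names and the tolerance are hypothetical choices for this sketch.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def expected_excess(b, s):
    """E(|W| - b)_+ for W ~ N(0, s^2), in closed form."""
    c = b / s
    return 2.0 * (s * _STD_NORMAL.pdf(c) - b * (1.0 - _STD_NORMAL.cdf(c)))


def clipping_height(r, s, tol=1e-10):
    """Solve (1 - r) * E(|W| - b)_+ = r * b for b by bisection (0 < r < 1)."""
    if not 0.0 < r < 1.0:
        raise ValueError("radius r must lie strictly between 0 and 1")

    def g(b):
        # Strictly decreasing in b, positive at b = 0.
        return (1.0 - r) * expected_excess(b, s) - r * b

    lo = 0.0
    # Since E(|W| - b)_+ < E|W| = s * sqrt(2/pi) for b > 0, g is negative here.
    hi = (1.0 - r) * s * math.sqrt(2.0 / math.pi) / r
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for r in (0.01, 0.05, 0.1, 0.25):
        print(r, round(clipping_height(r, 1.0), 4))
```

The resulting clipping height decreases as the radius grows and scales linearly in $s$, so in the scalar case it suffices to solve (12) once for $s = 1$.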

If $r$ is unknown, which is almost always the case, we translate an idea worked out in
Rieder et al. (2008) into our setting: Assume we know that $r \in [r_l, r_u]$, $0 < r_l < r_u < 1$.
Then we define a least favorable radius $r_0$ such that the procedure rLS($b(r_0)$) with clipping
height $b = b(r_0)$ minimizes the maximal inefficiency (w.r.t. the procedure knowing the
respective radius) among all procedures rLS($b(r)$), i.e., each rLS for some clipping height
$b(r) \neq b(r_0)$ has a larger inefficiency for some $r' \in [r_l, r_u]$. Radius $r_0$ can be computed
quite effectively by a bisection method: Let

$$A_r = E_{id}\big[ \mathrm{tr}\, \mathrm{Cov}_{id}[\Delta X | \Delta Y^{id}] + (|M^0 \Delta Y^{id}| - b(r))_+^2 \big] \qquad (13)$$
$$B_r = E_{id}\big[ |M^0 \Delta Y^{id}|^2 - (|M^0 \Delta Y^{id}| - b(r))_+^2 \big] + b(r)^2 \qquad (14)$$

Then $r_0$ solves

$$A_{r_0} / A_{r_l} = B_{r_0} / B_{r_u} \qquad (15)$$

compare Ruckdeschel (2010, Lemma 3.1).

From another point of view, for given b, you may interpret criteria (11) and (12) 
the other way round, giving the efficiency loss in the ideal model, or the size of the SO 
neighborhood, for which the corresponding procedure is MSE-minimax. 

(One-Step)-Optimality of the rLS. The rLS filter is optimally-robust in some sense: To
see this, in a first step we boil down our SSM to the following simplified model.^1 We
have an unobservable but interesting state $X \sim P^X(dx)$, where for technical reasons we
assume that in the ideal model $E|X|^2 < \infty$. Instead of $X$ we rather observe the sum

$$Y = X + \varepsilon \qquad (16)$$

of $X$ and a stochastically independent error $\varepsilon$. We assume that in the ideal model, the
conditional distribution of $Y$ given $X$ allows for densities w.r.t. some measure $\mu$,
i.e.,

$$P^{Y|X=x}(dy) = p^\varepsilon(y - x)\, \mu(dy) \qquad P^X(dx)\text{-a.e.} \qquad (17)$$

As (wide-sense) AO model, we consider an SO outlier model already used by Birmiwal
and Shen (1993) and Birmiwal and Papantoni-Kazakos (1994),

$$Y^{re} = (1 - U)\, Y^{id} + U\, Y^{di}, \qquad U \sim \mathrm{Bin}(1, r) \qquad (18)$$

for SO-contamination radius $0 \le r \le 1$ specifying the size of the corresponding neighborhood,
i.e., the probability for an SO. $U$ is assumed independent of $(X, Y^{id})$ and $(X, Y^{di})$,
and in addition we assume

$$Y^{di},\ X \ \text{independent} \qquad (19)$$

where $\mathcal{L}(Y^{di})$ is arbitrary, unknown and uncontrollable (a.u.u.). The corresponding
neighborhood is defined as

$$\mathcal{U}^{SO}(r) = \bigcup_{0 \le s \le r} \big\{ \mathcal{L}(X, Y^{re}) \;\big|\; Y^{re} \text{ acc. to (18) and (19) with radius } s \big\} \qquad (20)$$

^1 Instead of this simplification, we could even use a more general "Bayesian" model as simplification,
compare Ruckdeschel (2010, section 3.2).
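
The substitutive outlier mechanism of (18) and (19) can be sketched as follows. The function `so_contaminate` and the callback `draw_outlier` are hypothetical names; the callback stands for the arbitrary contaminating distribution and must not depend on the state, in line with (19).

```python
import random


def so_contaminate(y_ideal, r, draw_outlier, seed=0):
    """Replace each ideal observation by an outlier with probability r.

    draw_outlier(rng) returns one draw of the contaminating observation; it
    receives only the random generator, so it cannot depend on the state.
    Returns (contaminated observations, outlier indicators U).
    """
    rng = random.Random(seed)
    y_real, is_outlier = [], []
    for y in y_ideal:
        u = rng.random() < r                        # U ~ Bin(1, r)
        y_real.append(draw_outlier(rng) if u else y)
        is_outlier.append(u)
    return y_real, is_outlier


if __name__ == "__main__":
    ideal = [0.1 * t for t in range(20)]
    real, flags = so_contaminate(ideal, 0.1, lambda rng: rng.gauss(0.0, 20.0))
    print(sum(flags), "of", len(ideal), "observations were substituted")
```

Because the ideal value is discarded entirely when $U = 1$, a substituted observation carries no information on the state, which distinguishes the SO model from purely additive contamination.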

With this setting we may formulate two typical robust optimization problems: 

Minimax-SO problem. Minimize the maximal MSE on an SO-neighborhood, i.e., find a
measurable reconstruction $f$ for $X$ s.t.

$$\max_{\mathcal{U}} E_{re} |X - f(Y^{re})|^2 = \min_f ! \qquad (21)$$

Lemma5-SO problem. In the spirit of Hampel (1968, Lemma 5), minimize the MSE in the
ideal model but subject to a bound on the bias to be fulfilled on the whole neighborhood,
i.e., find a measurable reconstruction $f$ for $X$ s.t.

$$E_{id} |X - f(Y^{id})|^2 = \min_f ! \quad \text{s.t.} \quad \sup_{\mathcal{U}} |E_{re} f(Y^{re}) - E X| \le b \qquad (22)$$

The solution to both problems can be summarized as 

Theorem 3.2 (Minimax-SO, Lemma5-SO). (1) In this situation, there is a saddle-point
$(f_0, P_0^{Y^{di}})$ for Problem (21),

$$f_0(y) := E X + D(y)\, w_r(D(y)), \qquad w_r(z) = \min\{1, \rho/|z|\} \qquad (23)$$
$$P_0^{Y^{di}}(dy) := \tfrac{1-r}{r} \big( |D(y)|/\rho - 1 \big)_+ \, P^{Y^{id}}(dy) \qquad (24)$$

where $\rho > 0$ ensures that $\int P_0^{Y^{di}}(dy) = 1$ and $D(y) = E_{id}[X | Y = y] - E X$.

(2) $f_0$ from (23) also is the solution to Problem (22) for $b = \rho/r$.

(3) If $E_{id}[X|Y]$ is linear in $Y$, i.e., $E_{id}[X|Y] = MY$ for some matrix $M$, then necessarily

$$M = M^0 = \mathrm{Cov}(X, Y)\, \mathrm{Var}\, Y^{-} \qquad (25)$$

or in SSM formulation: $M^0$ is just the classical Kalman gain and $f_0$ the (one-step)
rLS.

The proof of this theorem is given in Ruckdeschel (2010, Thm. 3.2).

Model (16) already covers our normal SSM: we only have to identify $X$ in
model (17) with $\Delta X_t$ and replace $p^\varepsilon(y - x)\,\mu(dy)$ with $\mathcal{N}(Z_t \Delta X_t, V_t)(dy)$. If even $\Delta X_t$
is normal, (3) applies and rLS is SO-optimal.

Remark 3.3. (a) The ACM filter by Masreliez and Martin (1977), which is a competitor
in this study, by analogy applies Huber (1964)'s minimax variance result to the
"random location parameter $X$" setting of (16). They come up with redescenders as
filter $f$. Hence the ACM filter is not so much vulnerable in the extreme tails but rather
where the corresponding $\psi$ function takes its maximum in absolute value. Care has to
be taken, as such "inliers" producing the least favorable situation for the ACM are much
harder to detect on naive data inspection, in particular in higher dimensions.

(b) For exact SO-optimality of the rLS-filter, linearity of the ideal conditional expectation
is crucial. However, one can show (Ruckdeschel, 2010, Prop. 3.6) that $E_{id}[\Delta X|\Delta Y]$
is linear iff $\Delta X$ is normal, but, having used the rLS-filter in the $\Delta X$-past, normality
cannot hold, compare Ruckdeschel (2010, Prop. 3.4).

Back in the $\Delta X$ model for $t > 1$: eSO-neighborhoods. As noted in the last remark,
rLS fails to be SO-optimal for $t > 1$. Nevertheless rLS performs quite well on both
simulations and real data. One explanation for this is to consider an extension of
the original SO-neighborhoods, the extended SO or eSO-model (compare Ruckdeschel
(2010, Sect. 3.4)), where we allow $X$ to be suitably corrupted as well. In fact, the
pair $(f_0, P_0^{Y^{di}})$ from Theorem 3.2, extended to $(f_0, P_0^{Y^{di}} \otimes P^{X^{di}})$ for any
$P^{X^{di}}$ with $E_{di} |X^{di}|^2 = G$, remains a saddle-point in the corresponding minimax problem
on the eSO-neighborhood of the same radius, cf. Ruckdeschel (2010, Thm. 3.10). As a
consequence (compare Ruckdeschel (2010, Prop. 3.11)), in the Gaussian setup, instead
of focusing on the (SO-) saddle-point solution to a $\mathcal{U}^{SO}(r)$-neighborhood around $\mathcal{L}(\Delta X)$
stemming from an rLS-past, we use the following coupling-type idea:

We assume that for each time $t$, there is a (fictive) random variable $\Delta X_t^{\mathcal{N}} \sim \mathcal{N}_p(0, \Sigma)$
such that $\Delta X_t^{rLS}$ stemming from an rLS-past can be considered a contaminating $X^{di}$ in
the corresponding eSO-neighborhood around $\Delta X_t^{\mathcal{N}}$ with radius $r$. From this perspective,
the rLS is exactly minimax for each time $t$.

3.3. IO-optimality

As noted, in the presence of IO's, we want to follow an IO outlier as fast as possible.
The Kalman filter in this situation does not behave as badly as in the AO situation, but
still tends to be too inert. To improve upon this, let us go back to (16), which reveals a
useful symmetry of $X$ and $\varepsilon$: Apparently

$$E[X|Y] = Y - E[\varepsilon|Y] \qquad (26)$$

Hence we follow $Y$ more closely if we damp estimation of $\varepsilon$, for which we use the rLS
filter. We should note that in doing so, we rely on "clean", i.e., ideally distributed errors $\varepsilon$.
With the obvious replacements, Theorem 3.2 translates word by word to a corresponding
minimax theorem for IO's, compare Ruckdeschel (2010, Thm. 4.1).

rLS.IO. In analogy to the definition of the rLS in equation (10), we set up an IO-robust
version of the rLS as follows: We retain the initialization and prediction step
of the classical Kalman filter and, assuming $Z_t$ invertible for the moment, replace the
correction step by

$$X_{t|t} = X_{t|t-1} + Z_t^{-1} \big[ \Delta Y_t - H_b\big( (I_q - Z_t M_t^0)\, \Delta Y_t \big) \big] \qquad (27)$$

where the same arguments for the choice of the norm and the clipping height apply as
for the AO-robust version of the rLS.

To better distinguish IO- and AO-robust filters, let us call the IO-robust version
rLS.IO and (for distinction) the AO-robust filter rLS.AO in the sequel.
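
The contrast between the two robustified correction steps, (10) for rLS.AO and (27) for rLS.IO, can be illustrated in the scalar case, where the Huberization reduces to clamping to $[-b, b]$. The function names below are assumptions for this sketch; the gain `M0` is taken as given from the classical recursion.

```python
def clamp(v, b):
    """Scalar Huberization H_b(v) = v * min{1, b/|v|}, i.e. clamping to [-b, b]."""
    return max(-b, min(b, v))


def rls_ao_correction(x_pred, dY, M0, b):
    """AO-robust step: bound the correction M0 * dY itself."""
    return x_pred + clamp(M0 * dY, b)


def rls_io_correction(x_pred, dY, M0, Z, b):
    """IO-robust step for invertible (nonzero) scalar Z: bound the part of the
    innovation attributed to the observation error, follow the rest."""
    return x_pred + (dY - clamp((1.0 - Z * M0) * dY, b)) / Z


if __name__ == "__main__":
    for dY in (0.5, 100.0):
        print(dY,
              rls_ao_correction(0.0, dY, 0.6, 2.0),
              rls_io_correction(0.0, dY, 0.6, 1.0, 2.0))
```

For small innovations both steps coincide with the classical Kalman correction. For a large innovation, the AO version moves by at most $b$, whereas the IO version follows the observation up to an amount bounded by $b$.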
Invertibility problem. Back in the (linear, discrete-time, Euclidean) SSM, the approach
just described faces the problem that in general the matrix $Z_t$ will not be invertible, so
we cannot reconstruct $X$ injectively from $Y$ and $\varepsilon$. Under a certain full-rank condition,
this problem can be solved by passing to corresponding rLS-type smoothers.
The assumption we need is a version of complete constructibility, cf. Anderson and
Moore (1979, Appendix), adapted to the time-inhomogeneous case: Denoting the product
$F_{t+p} F_{t+p-1} \cdots F_t$ by $F_{t+p:t}$, we assume that for each $t$, the range $F_{t+p-1:t}(\mathbb{R}^p)$ is contained in
the range of the block matrix $[Z_t^\tau, F_t^\tau Z_{t+1}^\tau, F_{t+1:t}^\tau Z_{t+2}^\tau, \ldots, F_{t+p-1:t}^\tau Z_{t+p-1}^\tau]$. Details will be given in a subsequent
paper.

Remark 3.4. (a) It is worth noting that our IO-robust version is also a filter, hence
does not use information from observations made after the state to be reconstructed; rLS.IO is
strictly recursive and non-iterative, hence well-suited for online applications.
(b) An alias for rLS.IO could be "hysteric filter", as it hysterically follows any changes
in the $Y$'s.

4. Simultaneous Treatment of AO's and IO's 

This section is less backed by theoretical results than the ones on the pure AO or IO
situation; rather it proposes a heuristic to achieve both types of outlier robustness.
As already mentioned, simultaneous treatment of AO's and IO's is only possible with a
certain delay. With this delay, we can base our decision of whether there was an AO or
an IO on the size of subsequent $|\Delta Y_t|$'s: if there was an AO, this should result in only
one "large" $|\Delta Y_t|$ in a row, whereas in case of an IO, due to propagation, there should
be a whole sequence of large $|\Delta Y_t|$'s. So a hybrid filter (called rLS.IOAO for simplicity)
could be designed as follows:

For a given delay window width $w$, we run rLS.AO and rLS.IO in parallel (but only
store the last $w$ values of rLS.IO). By default we return the rLS.AO values. Whenever
there is a run of $w$ "large" $|\Delta Y_t^{rLS.AO}|$'s, we replace the last $w$ filter values by the
corresponding rLS.IO values and use these to continue with the rLS.AO.

Apparently rLS.IOAO gets into trouble in windows where we have both IO's and 
AO's; here some modeling of the type of structural change (local constant, or better: 
local linear) as in Fried et al. (2007) should be helpful; we do not treat this in this paper, 
though. 

Operationalization. In the ideal (Gaussian) model, the $\Delta Y_t$'s should be independent, so
a reasonable decision on whether a sequence of $|\Delta Y_t^{rLS.AO}|$'s is "large" could be based
on corresponding quantiles of $|\Delta Y_t^{rLS.AO}|$ in the ideal model, or, somewhat easier, of
$\Delta Y_t^\tau \Delta_t^{-1} \Delta Y_t$ which, assuming $\mathcal{L}(\Delta Y_t)$ to be approximately normal, is approximately
$\chi^2_q(0)$-distributed. Relaxing this condition a little, we already switch to rLS.IO when a
high percentage $h$ of the $w$ $|\Delta Y_t|$'s are larger than this given quantile.

This leaves us to determine several tuning parameters: the window width $w$ (this should
really be chosen according to the application; we have obtained good results in our
examples with $w = 5$), the clipping heights for rLS.IO and rLS.AO (proposal: according
to (versions of) (11) or (12)), the percentage $h$ (default: 80% of the last $w$
instances of $|\Delta Y_t^{rLS.AO}|$), and the corresponding quantile (default: 99%), assuming that
$\Delta Y_t \sim \mathcal{N}_q(0, \Delta_t)$.
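
The hybrid heuristic with its tuning parameters can be sketched for the scalar case as follows. This is only one possible reading of the description above, not the robKalman implementation. It assumes that both filters share the classical (data-independent) variance recursion, that rLS.IO runs its own parallel recursion, that "a high percentage" means at least $h \cdot w$ of the last $w$ innovations, and that the window is reset after a switch. The function name `rls_ioao` and all argument names are hypothetical.

```python
from statistics import NormalDist


def clamp(v, b):
    """Scalar Huberization: clamp v to [-b, b]."""
    return max(-b, min(b, v))


def rls_ioao(ys, b_ao, b_io, w=5, h=0.8, level=0.99,
             F=1.0, Z=1.0, Q=1.0, V=1.0, a0=0.0, Q0=1.0):
    """Scalar sketch of the hybrid rLS.IOAO heuristic; returns filter values."""
    # Quantile of chi^2_1: for N ~ N(0,1), P(N^2 <= c) = 2 * Phi(sqrt(c)) - 1.
    crit = NormalDist().inv_cdf(0.5 + level / 2.0) ** 2
    x_ao = x_io = a0
    S = Q0
    out, io_hist, flags = [], [], []
    for y in ys:
        # Classical variance recursion (assumed common to both filters).
        S_pred = F * S * F + Q
        Delta = Z * S_pred * Z + V
        M0 = S_pred * Z / Delta
        S = (1.0 - M0 * Z) * S_pred
        # rLS.AO step: bounded correction.
        xp_ao = F * x_ao
        dy_ao = y - Z * xp_ao
        x_ao = xp_ao + clamp(M0 * dy_ao, b_ao)
        # rLS.IO step: follow the observation, bound the error part.
        xp_io = F * x_io
        dy_io = y - Z * xp_io
        x_io = xp_io + (dy_io - clamp((1.0 - Z * M0) * dy_io, b_io)) / Z
        out.append(x_ao)                       # default: return rLS.AO value
        io_hist = (io_hist + [x_io])[-w:]      # keep last w rLS.IO values
        flags = (flags + [dy_ao * dy_ao / Delta > crit])[-w:]
        if len(flags) == w and sum(flags) >= h * w - 1e-12:
            out[-w:] = io_hist                 # replace last w filter values
            x_ao = x_io                        # continue rLS.AO from rLS.IO
            flags = []                         # assumed: restart the window
    return out


if __name__ == "__main__":
    level_shift = [0.0] * 10 + [20.0] * 10
    print([round(v, 2) for v in rls_ioao(level_shift, b_ao=2.0, b_io=2.0)])
```

In the usage example, an isolated spike would never trigger the switch because only one innovation in the window is large, while the level shift produces a run of large rLS.AO innovations and is picked up after the delay.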
Remark 4.1. (a) Note that although the decision whether we issue the rLS.IO or
the rLS.AO values is made $w$ observations after the state which is to be reconstructed,
we still only use filters; hence the information of $Y_{t+j}$, $j = 1, \ldots, w - 1$, is not used
to improve the reconstruction so far, as this would involve corresponding (yet-to-be-robustified)
smoothers. Once the corresponding work on robust smoothing is done,
we could surely use this additional information.

(b) In a time-invariant linear SSM (i.e., with hyper-parameters $F$, $Z$, $Q$, and $V$ constant
in time), in general $\Delta_t$ will converge in $t$ exponentially fast (also if one uses bounded rLS
steps), so these tuning parameters will only have to be determined for a small number
of time instances $t$, compare Ruckdeschel (2001, chap. 7). In fact, setting them time-invariant
right from the beginning will often do a reasonable job already.

5. Simulation Example: Steady State Model 

Our running example will be a one-dimensional steady state model with hyper-parameters

$$p = q = 1, \quad F_t = Z_t = 1, \quad \text{in the ideal model: } v_t, \varepsilon_t \stackrel{\rm i.i.d.}{\sim} \mathcal{N}(0, 1) \qquad (28)$$

We consider the performance of the classical Kalman filter, rLS.AO, rLS.IO, and rLS.IOAO in
this model and under AO's and IO's. More specifically, we have generated deterministic
AO's in observations 10, 15, 23, and IO's in observations 20-25 (a local linear trend) and
37-42 (level shift).

As competitors, we include the ACM filter by Martin (1979), implemented by B. Spangl
in the R package robKalman, and a variant hybr_PRMH of robfilter, cf. Fried and Schettlinger
(2008) concerning its implementation and Fried et al. (2006) regarding its definition,
which is a non-parametric filter fitting local levels and linear trends based on repeated
medians. The results are plotted in Figures 1-4, where in the plots we confine ourselves
to the rLS variants, which already makes for five curves to be plotted in one panel.

In the ideal situation, all filters perform well, with slight advantages for the classical 
Kalman filter (which has smallest theoretical MSE), but closely followed (and in the 
prediction case slightly beaten) by the rLS.IO. 

In the IO situation, the "hysteric" rLS.IO filter performs best, beating the classical
Kalman filter; both rLS.IOAO and hybr_PRMH perform reasonably well, while the AO-robust
filters ACM and rLS.AO are not able to track the IO at all (as they can only
perform bounded correction steps) and hence, like a hanging slope, only slowly recover
the changed situation.

In the AO situation, we have the complementary picture; here ACM performs best
(see also Remark 5.1(a)), but rLS.AO performs only slightly worse. rLS.IOAO is a
little worse again, followed, with a certain gap but still performing reasonably well, by hybr_PRMH, while
both the classical Kalman filter and rLS.IO (the latter even worse) perform drastically badly.

Finally, in the mixed IO and AO situation, hybr_PRMH is by far the best solution,
followed with a certain gap by rLS.IOAO, while all other filters perform unacceptably
badly. By construction, rLS.IOAO assumes that at every time instance there can only be
either an AO or an IO. Otherwise the corresponding MSE gets unbounded on every
neighborhood $\mathcal{U}(r)$ for $r > 0$. Hence the AO in observation 23 really confuses rLS.IOAO
completely, compare Figures 1-4: it has just switched to "hysteric" IO behavior and
hence faithfully follows the AO. The repeated-median-based hybr_PRMH does not have this
problem, as the median stays stable even under (almost) arbitrary substitutive outliers;
hence it is able to keep the local linear trend. Omitting observation 23 results in a much
better performance of rLS.IOAO, which then even beats hybr_PRMH, cf. Table 2.

Figure 1 (panel title "ideal situation"): rLS-filter variants in model (28) in the ideal model; in the panel below (note the different
y-scale) both actual states (black) and observations (red) are plotted.

Averaging over time in one realization of the SSM, we get the "ergodic" empirical 
MSEs as displayed in Tables 1, 2. 
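
The "ergodic" empirical MSE, i.e., the time average of squared reconstruction errors along one realization, can be sketched as follows for the classical Kalman filter in the ideal model (28). The run length, the seed, and the initial distribution $X_0 \sim \mathcal{N}(0, 1)$ are assumptions of this illustration, not the settings behind the tables.

```python
import random


def ergodic_mse(n=20000, seed=1):
    """Time-averaged squared filter and prediction errors of the classical
    Kalman filter along one simulated path with F = Z = Q = V = 1."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)          # assumed start: X_0 ~ N(0, 1)
    x_filt, S = 0.0, 1.0             # matching initialization a0 = 0, Q0 = 1
    se_filt = se_pred = 0.0
    for _ in range(n):
        x = x + rng.gauss(0.0, 1.0)              # state equation
        y = x + rng.gauss(0.0, 1.0)              # observation equation
        x_pred, S_pred = x_filt, S + 1.0         # prediction step
        M0 = S_pred / (S_pred + 1.0)             # Kalman gain
        x_filt = x_pred + M0 * (y - x_pred)      # correction step
        S = (1.0 - M0) * S_pred
        se_pred += (x - x_pred) ** 2
        se_filt += (x - x_filt) ** 2
    return se_filt / n, se_pred / n


if __name__ == "__main__":
    mse_filter, mse_pred = ergodic_mse()
    print(round(mse_filter, 3), round(mse_pred, 3))
```

For this model the theoretical steady-state values are $(\sqrt{5} - 1)/2 \approx 0.618$ for filtering and $(\sqrt{5} + 1)/2 \approx 1.618$ for prediction; time averages over a short single realization, such as the one underlying the tables, scatter around these values.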

Remark 5.1. (a) We can explain why the ACM filter beats the rLS in the AO
situation by the fact that the contamination in this study clearly covers the worst-case
behavior of the rLS but not that of the ACM filter, compare Remark 3.3(a); it also
fails to do so for hybr_PRMH.

(b) rLS.IOAO really has its advantages in higher dimensions, where median-based filters
are much harder to define and get computationally expensive. One might even think
of combining rLS.IOAO and hybr_PRMH in these settings: first let rLS.IOAO do a preliminary,
fast, and dimension-independent cleaning, and then let hybr_PRMH polish this
result coordinate-wise.

(c) It is still an open question whether we can improve on the rLS.IOAO behavior
using the SSM

$$Z_t = (1, t), \quad X_t = (a_t, b_t)^\tau, \quad F_t = I_2, \quad Q_t = 0.1\, I_2, \quad V_t = 1$$

which (up to the specification of error/innovation variance) is essentially the model in the
background of hybr_PRMH. In this setting $Z_t$ is not invertible, but the model is completely
constructible, so passing to smoothers might help.

Figure 2 (panel title "IO situation"): rLS-filter variants in model (28) with IO's: a local linear trend at $X_{20}$-$X_{25}$ and a level shift
for states $X_{37}$-$X_{42}$; the panel below (note the different y-scale) is as in Figure 1.

empirical MSE

Situation  Type     Kalman   rLS.IO   rLS.AO   rLS.IOAO   ACM      hybr_PRMH
ideal      filter    0.59*    0.60     0.75      1.08      0.77     1.41
           pred      1.69     1.67*    1.96      2.26      2.01
IO         filter    1.04     0.83*    6.54      1.36     25.19     1.36
           pred      5.28     4.71*   12.17      5.42     32.16
AO         filter   15.25    30.38     0.91      1.16      0.82*    1.79
           pred     15.15    29.68     2.00*     2.25      2.05
IO&AO      filter   17.00    30.52    12.89      7.78     28.76     1.53*
           pred     21.94    34.56    19.23     13.87*    36.08

Table 1: "ergodic" estimates for the MSE of the variants of the rLS, of the ACM, and of hybr_PRMH in
the situation described in the text; best results (smallest value per row) are marked by an asterisk.

Figure 3 (panel title "AO situation"): rLS-filter variants in model (28) with AO's in observations 10, 15, 23; the panel below (note the
different y-scale) is as in Figure 1.

Figure 4 (panel title "IO & AO situation"): rLS-filter variants in model (28) with both IO's as in Figure 2 and AO's as in Figure 3; the
panel below (note the different y-scale) is as in Figure 1.

empirical MSE, without obs. 23

Situation  Type     Kalman   rLS.IO   rLS.AO   rLS.IOAO   ACM      hybr_PRMH
ideal      filter    0.59*    0.60     0.75      1.10      0.78     1.43
           pred      1.71     1.69*    1.99      2.29      2.03
IO         filter    0.94     0.74*    6.08      1.26     24.48     1.38
           pred      5.59     4.98*   12.05      5.73     31.66
AO         filter   12.46    24.07     0.86      1.15      0.84*    1.83
           pred     12.18    23.05     1.94*     2.25      2.10
IO&AO      filter   13.28    24.21    11.58      1.31*    27.93     1.56
           pred     17.01    26.34    17.80      5.63*    35.38

Table 2: results as in Table 1, but excluding the values for observation 23, where we had coincidence of
(wide-sense) IO and AO, a situation not covered in the design of rLS.IOAO.

Remark 5.2. A careful reader might object that we have not included any real-world
data application. We have refrained from doing so because most data sets we have analyzed come
with unknown hyper-parameters. The setting of this paper assumes knowledge of these
hyper-parameters, so we would have to estimate them somehow, possibly by the EM
algorithm of Shumway and Stoffer (1982) or by more recent refinements. But since
the robustness properties of combining a non-robust M-step (i.e., maximum likelihood)
and a robust E-step (achieved using the procedures of this paper) are not evident, and a
corresponding robustification of the M-step would have been out of scope, we have
confined ourselves to simulations.

6. Implementation: R-package robKalman 

rLS.AO was originally implemented in XploRe, compare Ruckdeschel (2000). In an
ongoing project with Bernhard Spangl (BOKU, Vienna) and Irina Ursachi (ITWM), we
are about to implement all the rLS filters in R, see R Development Core Team (2010),
more specifically in an R package robKalman, the development of which is done under the
r-forge project https://r-forge.r-project.org/projects/robkalman/, see also R-Forge
Administration and Development Team (2008). Under this address you will also
find a preliminary version available for download.

7. Conclusion 

In the extremely flexible class of dynamic models consisting in SSMs, we apply (distributional)
robust optimality results for filtering. We could show that, contrary to common
prejudice, a simultaneous treatment of (wide-sense) IO's and AO's is possible in SSM's,
albeit with a minor delay. The filters that we propose are model based (in contrast to the
non-parametric hybr_PRMH), which means that we need a higher degree of model specification,
in that we possibly have to estimate the hyper-parameters, but which also could
help to get more precise in the ideal model (in particular for higher dimensions). Our
filters are non-iterative, recursive, hence fast, and valid for higher dimensions. They are
available in R in some development versions and hopefully on CRAN soon.